Searching Large Textual Dataset With Limited Computational Resources
Author
Abstract
In this paper we propose a search approach that processes large volumes of textual data efficiently and effectively, even in environments where computational resources are limited. Traditional search solutions for large collections assume the availability of practically unlimited computational resources. For many applications and organizations this assumption is not realistic. Empirical evaluation on some of the largest available datasets demonstrates that the proposed approach is substantially more efficient than the existing approach, is on par with it, if not better, in terms of effectiveness, and can operate with very few computational resources.
AUDIENCE: [Information Retrieval/Large-scale Search] [Text Processing] [Advanced technical talk]
Similar resources
The Logic and Discovery of Textual Allusion
We describe here a method for discovering imitative textual allusions in a large collection of Classical Latin poetry. In translating the logic of literary allusion into computational terms, we include not only traditional IR variables such as token similarity and n-grams but also a comparison of syntactic structure. This provides a more robust search method for Classical la...
A cognitive computational model of eye movements investigating visual strategies on textual material
This article presents a computational model of the visual strategies involved in processing textual material. An experiment is presented in which participants performed different tasks on a multi-paragraph page (searching for a target word, searching for the most relevant paragraph according to a goal, memorizing paragraphs). The proposed model predicts eye movements based on five parameters. The weightin...
Romanian Linguistic Resources On Very Large Scale
This paper suggests a methodology for building a technological environment for linguistic processing, intended to conserve, update and exploit, for research, for public and for commercial purposes, strategic linguistic resources of the Romanian language, rooted in textual data contributed daily and in the long run by important editorial houses and mass-media institutions. In essence, it describ...
Building a Large-Scale Repository of Textual Entailment Rules
Entailment rules are rules in which the left-hand side (LHS) specifies some knowledge that entails the knowledge expressed in the right-hand side (RHS), with some degree of confidence. Simple entailment rules can be combined into complex entailment chains, which in turn form the basis of entailment-based reasoning, which has recently been proposed as a pervasive and application-independent approach to ...
Integrating Textual and Model-Based Process Descriptions for Comprehensive Process Search
Documenting business processes using process models is common practice in many organizations. However, not all process information is best captured in process models. Hence, many organizations complement these models with textual descriptions that specify additional details. The problem with this supplementary use of textual descriptions is that existing techniques for automatically searching p...